Abstract



Classical fingerprinting associates with each string a shorter string (its fingerprint), such
that, with high probability, any two distinct strings can be distinguished by comparing their
fingerprints alone. The fingerprints can be exponentially smaller than the original strings if
the parties preparing the fingerprints share a random key, but not if they only have access to
uncorrelated random sources. In this paper we show that fingerprints consisting of quantum
information can be made exponentially smaller than the original strings without any correlations
or entanglement between the parties: we give a scheme where the quantum fingerprints
are exponentially shorter than the original strings and we give a test that distinguishes any
two unknown quantum fingerprints with high probability. Our scheme implies an exponential
quantum/classical gap for the equality problem in the simultaneous message passing model of
communication complexity. We optimize several aspects of our scheme.


1 Introduction 

Fingerprinting can be a useful mechanism for determining if two strings are the same: each string 
is associated with a much shorter fingerprint and comparisons between strings are made in terms of 
their fingerprints alone. This can lead to savings in the communication and storage of information. 

The notion of fingerprinting arises naturally in the setting of communication complexity (see
[KN97]). The particular model of communication complexity that we consider in this paper is called
the simultaneous message passing model, which was introduced by Yao [Yao79] in his original paper
on communication complexity. In this model, two parties (Alice and Bob) receive inputs x and y,
respectively, and are not permitted to communicate with one another directly. Rather, they each
send a message to a third party, called the referee, who determines the output of the protocol based
solely on the messages sent by Alice and Bob. The collective goal of the three parties is to cause
the protocol to output the correct value of some function f(x,y) while minimizing the amount of
information that Alice and Bob send to the referee.
For the equality problem, the function is simply

$$f(x,y) = \begin{cases} 1 & \text{if } x = y, \\ 0 & \text{if } x \neq y. \end{cases}$$



The problem can of course be trivially solved if Alice sends x and Bob sends y to the referee, who
can then simply compute f(x,y). However, the cost of this protocol is high; if x and y are n-bit
strings, then a total of 2n bits are communicated. If Alice and Bob instead send fingerprints of x
and y, which may each be considerably shorter than x and y, the cost can be reduced significantly.
The question we are interested in is how much the size of the fingerprints can be reduced.

If Alice and Bob share a random O(log n)-bit key then the fingerprints need only be of constant
length if we allow a small probability of error; a brief sketch of this follows. A binary
error-correcting code is used, which can be represented as a function E : {0,1}^n → {0,1}^m, where E(x)
is the codeword associated with x ∈ {0,1}^n. There exist error-correcting codes (Justesen codes, for
instance) with m = cn such that the Hamming distance between any two distinct codewords E(x)
and E(y) (with x ≠ y) is at least (1 − δ)m, where c and δ are constants. For the particular case
of Justesen codes, we may choose any c > 2 and we will have δ < 9/10 + 1/(15c) (assuming n is
sufficiently large). For further information on Justesen codes, see Justesen [Jus72] and MacWilliams
and Sloane [MS77, Chapter 10]. Now, for x ∈ {0,1}^n and i ∈ {1, 2, ..., m}, let E_i(x) denote the i-th
bit of E(x). The shared key is a random i ∈ {1, 2, ..., m} (which consists of log(m) ∈ log(n) + O(1)
bits). Alice and Bob respectively send the bits E_i(x) and E_i(y) to the referee, who then outputs 1
if and only if E_i(x) = E_i(y). If x = y then E_i(x) = E_i(y), so the outcome is correct. If x ≠ y
then the probability that E_i(x) = E_i(y) is at most δ, so the outcome is correct with probability
at least 1 − δ. The error probability can be reduced from δ to any ε > 0 by having Alice and Bob send
O(log(1/ε)) independent random bits of the codewords E(x) and E(y) to the referee. In this case,
the length of each fingerprint is O(log(1/ε)) bits.
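The shared-key protocol can be checked exhaustively on tiny inputs. The Python sketch below is illustrative only: because Justesen codes are involved to implement, it swaps in the Hadamard code, for which m = 2^n and any two distinct codewords agree in exactly half of their positions (so δ = 1/2). That code has exponential length, so its key has n bits rather than O(log n); all function names are hypothetical.

```python
import random


def hadamard_code(x: int, n: int) -> list[int]:
    """Hadamard codeword of the n-bit input x (a stand-in for a Justesen code).

    Bit i is the parity of the bitwise AND of x and i, for i = 0, ..., 2**n - 1.
    Two distinct codewords agree in exactly half of the 2**n positions.
    """
    return [bin(x & i).count("1") % 2 for i in range(2 ** n)]


def referee_accepts(x: int, y: int, n: int, keys: list[int]) -> bool:
    """Shared-key protocol: for each shared index i, Alice sends E_i(x) and Bob
    sends E_i(y); the referee outputs 1 (True) iff every pair of bits agrees."""
    ex, ey = hadamard_code(x, n), hadamard_code(y, n)
    return all(ex[i] == ey[i] for i in keys)


def acceptance_probability(x: int, y: int, n: int) -> float:
    """Exact probability, over one uniformly random key index, that the referee outputs 1."""
    ex, ey = hadamard_code(x, n), hadamard_code(y, n)
    return sum(a == b for a, b in zip(ex, ey)) / len(ex)


n, t = 6, 10                                        # input length, number of key indices
rng = random.Random(1)
keys = [rng.randrange(2 ** n) for _ in range(t)]    # the shared random key
print(referee_accepts(13, 13, n, keys), acceptance_probability(13, 42, n))
```

With t independent key indices, distinct inputs are wrongly accepted with probability δ^t = 2^(−t), which mirrors the O(log(1/ε)) repetition described above.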

One disadvantage of the above scheme is that it requires overhead in creating and maintaining
a shared key. Moreover, once the key is distributed, it must be stored securely until the inputs are
obtained. This is because an adversary who knows the value of the key can easily choose inputs x
and y such that x ≠ y but for which the output of the protocol always indicates that x = y.

Yao [Yao79, Section 4.D] posed as an open problem the question of what happens in this model
if Alice and Bob do not have a shared key. Ambainis [Amb96] proved that fingerprints of O(√n)
bits suffice if we allow a small error probability (see also [KNR95, NS96, BK97]). Note that in
this setting Alice and Bob still have access to random bits, but their random bits may not be
correlated. Subsequently, Newman and Szegedy [NS96] proved a matching lower bound of Ω(√n).
Their result was generalized by Babai and Kimmel [BK97] to the result that the randomized and
deterministic complexity can be at most quadratically far apart for any function in this model.
Babai and Kimmel attribute a simplified proof of this fact to Bourgain and Wigderson.

We shall consider the problem where there is no shared key (or entanglement) between Alice
and Bob, but the fingerprints can consist of quantum information. In Section 2 we show that
O(log n)-qubit fingerprints are sufficient to solve the equality problem in this setting, an exponential
improvement over the √n-bound for the comparable classical case. Our method is to set the 2^n
fingerprints to quantum states whose pairwise inner products are bounded below 1 in absolute value
and to use a test that identifies identical fingerprints and distinguishes distinct fingerprints with
good probability. (It is possible to take the fingerprints to be nearly pairwise orthogonal, although
the bound on the absolute value of the inner product between pairs of states is not directly related
to the error probability of the fingerprinting method.) This gives a simultaneous message passing
protocol for equality in the obvious way: Alice and Bob send the fingerprints of their respective
inputs to the referee, who then executes the test to check if the fingerprints are equal or distinct.
In Section 2, we also show that the fingerprints must consist of at least Ω(log n) qubits if the error
probability is bounded below 1.

In Sections 3 and 4, we consider possible improvements to the efficiency of the fingerprinting
methods of Section 2. In Section 3, we investigate the number of qubits required to contain 2^n
fingerprints with pairwise inner product bounded in absolute value by any δ < 1. In Section 4 we
consider the efficiency of tests that distinguish between k copies of pairs of identical states and k
copies of pairs of states whose inner product is bounded in absolute value by any δ < 1.

Finally, in Section 5 we consider a variation of fingerprinting with a shared quantum key,
consisting of O(log n) shared Bell states (EPR-pairs). We observe that results in [BCT99] imply that
errorless (i.e., exact) fingerprinting is possible with O(log n)-bit classical fingerprints in a particular
context where achieving the same performance with only a classical shared key requires fingerprints
of length Ω(n).

We assume the reader is familiar with the basic notions of quantum computation and quantum
information; for further information we refer the reader to the book by Nielsen and Chuang [NC00].



2 Quantum fingerprinting without shared keys 

In this section, we show how to solve the equality problem in the simultaneous message passing
model with logarithmic-length quantum fingerprints in a context where no shared key is available.
The solution is quite simple and the fingerprints are exponentially shorter than in the comparable
classical setting, where Θ(√n)-bit fingerprints are necessary and sufficient (see the references in
the introduction). The method that we present is based on classical error-correcting codes, though
in a different manner than discussed in Section 1, since no shared key is available.

Assume that for fixed c > 1 and δ < 1 we have an error-correcting code E : {0,1}^n → {0,1}^m
for each n, where m = cn and such that the distance between distinct codewords E(x) and E(y) is
at least (1 − δ)m. As mentioned in Section 1, a reasonable first choice of such codes are Justesen
codes, which give δ < 9/10 + 1/(15c) for any chosen c > 2. Now, for any choice of n, we define the
(log(m) + 1)-qubit state |h_x⟩ as

$$|h_x\rangle = \frac{1}{\sqrt{m}} \sum_{i=1}^{m} |i\rangle\,|E_i(x)\rangle \qquad (1)$$

for each x ∈ {0,1}^n. Since ⟨h_x|h_y⟩ = (1/m)·|{i : E_i(x) = E_i(y)}| and two distinct codewords can be
equal in at most δm positions, for any x ≠ y we have ⟨h_x|h_y⟩ ≤ δm/m = δ. Thus we have 2^n different
(log(n) + O(1))-qubit states, and each pair of them has inner product at most δ.
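The inner-product identity behind equation (1) is easy to check numerically. This NumPy sketch builds |h_x⟩ as a real vector of length 2m, storing the basis state |i⟩|b⟩ at index 2i + b (a layout chosen here for convenience). It again uses the Hadamard code as a stand-in for a Justesen code, so it illustrates the inner products rather than the logarithmic fingerprint length; the function names are hypothetical.

```python
import numpy as np


def hadamard_code(x: int, n: int) -> list[int]:
    """Stand-in code: bit i of the codeword is the parity of (x AND i)."""
    return [bin(x & i).count("1") % 2 for i in range(2 ** n)]


def fingerprint(codeword: list[int]) -> np.ndarray:
    """|h_x> = m^(-1/2) * sum_i |i>|E_i(x)>, with |i>|b> stored at index 2*i + b."""
    m = len(codeword)
    h = np.zeros(2 * m)
    for i, bit in enumerate(codeword):
        h[2 * i + bit] = 1.0
    return h / np.sqrt(m)


n = 4
x, y = 3, 10
hx, hy = fingerprint(hadamard_code(x, n)), fingerprint(hadamard_code(y, n))
agree = sum(a == b for a, b in zip(hadamard_code(x, n), hadamard_code(y, n)))
# <h_x|h_y> should equal the fraction of positions where the two codewords agree
print(hx @ hy, agree / 2 ** n)
```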

The simultaneous message passing protocol for the equality problem works as follows. When
given n-bit inputs x and y, respectively, Alice and Bob send fingerprints |h_x⟩ and |h_y⟩ to the referee.
Then the referee must distinguish between the case where the two states received (call them |φ⟩
and |ψ⟩) are identical or have inner product at most δ. This is accomplished with one-sided error
probability by the procedure that measures and outputs the first qubit of the state

$$(H \otimes I)\,(\text{c-SWAP})\,(H \otimes I)\,|0\rangle|\phi\rangle|\psi\rangle.$$

Here H is the Hadamard transform, which maps |b⟩ ↦ (1/√2)(|0⟩ + (−1)^b|1⟩), SWAP is the operation
|φ⟩|ψ⟩ ↦ |ψ⟩|φ⟩, and c-SWAP is the controlled-SWAP (controlled by the first qubit). The circuit
for this procedure is illustrated in Figure 1. By tracing through the execution of this circuit, one
can determine that the final state before the measurement is

$$\tfrac{1}{2}|0\rangle\big(|\phi\rangle|\psi\rangle + |\psi\rangle|\phi\rangle\big) + \tfrac{1}{2}|1\rangle\big(|\phi\rangle|\psi\rangle - |\psi\rangle|\phi\rangle\big).$$

Measuring the first qubit of this state produces outcome 1 with probability 1/2 − (1/2)|⟨φ|ψ⟩|², which is
the squared norm (1/4)(2 − 2|⟨φ|ψ⟩|²) of the |1⟩ branch. This probability is 0 if x = y and is at least
(1/2)(1 − δ²) > 0 if x ≠ y. Thus, the test determines which case holds with one-sided error (1/2)(1 + δ²).
Figure 1: Circuit to test if |φ⟩ = |ψ⟩ or |⟨φ|ψ⟩| ≤ δ
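The outcome probability of the test in Figure 1 can be confirmed by tracking the two branches of the control qubit. This NumPy sketch is a state-vector calculation rather than a gate-level circuit simulator: it applies the first Hadamard, the controlled swap, and the second Hadamard directly to amplitudes. The function name is hypothetical.

```python
import numpy as np


def swap_test_prob_one(phi: np.ndarray, psi: np.ndarray) -> float:
    """Probability that measuring the first qubit of
    (H (x) I)(c-SWAP)(H (x) I)|0>|phi>|psi> gives outcome 1."""
    phi = np.asarray(phi, dtype=complex)
    psi = np.asarray(psi, dtype=complex)
    d = len(phi)
    both = np.kron(phi, psi)                      # |phi>|psi>
    swapped = both.reshape(d, d).T.reshape(-1)    # |psi>|phi>
    # After the first H the control is (|0> + |1>)/sqrt(2); c-SWAP then swaps
    # the two registers only in the |1> branch.
    branch0, branch1 = both / np.sqrt(2), swapped / np.sqrt(2)
    # The second H sends |0> -> (|0>+|1>)/sqrt(2) and |1> -> (|0>-|1>)/sqrt(2),
    # so the |1> component of the control is (branch0 - branch1)/sqrt(2).
    out1 = (branch0 - branch1) / np.sqrt(2)
    return float(np.vdot(out1, out1).real)


rng = np.random.default_rng(0)
phi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
phi, psi = phi / np.linalg.norm(phi), psi / np.linalg.norm(psi)
print(swap_test_prob_one(phi, psi), 0.5 - 0.5 * abs(np.vdot(phi, psi)) ** 2)
```

Identical inputs give probability 0, which is the one-sided behaviour the protocol relies on.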

The error probability of the test can be reduced to any ε > 0 by setting the fingerprint of
x ∈ {0,1}^n to |h_x⟩^{⊗k} for a suitable k ∈ O(log(1/ε)). From such fingerprints, the referee can
independently perform the test in Figure 1 k times, resulting in an error probability below ε. In
this case, the length of each fingerprint is O((log n)(log(1/ε))).
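For concreteness, k can be taken as the smallest integer with ((1 + δ²)/2)^k ≤ ε. The helpers below (hypothetical names) compute that k and the resulting fingerprint length; rounding log(m) up to a whole number of qubits is an assumption of this sketch.

```python
import math


def repetitions_needed(delta: float, eps: float) -> int:
    """Smallest k with ((1 + delta**2) / 2) ** k <= eps: the number of independent
    runs of the Figure 1 test that brings the one-sided error down to eps."""
    if not (0 <= delta < 1 and 0 < eps < 1):
        raise ValueError("need 0 <= delta < 1 and 0 < eps < 1")
    per_test_error = (1 + delta ** 2) / 2
    return math.ceil(math.log(eps) / math.log(per_test_error))


def fingerprint_qubits(n: int, c: int, delta: float, eps: float) -> int:
    """Qubits in |h_x>^(tensor k): k copies of a state on ceil(log2(c*n)) + 1 qubits."""
    m = c * n
    return repetitions_needed(delta, eps) * (math.ceil(math.log2(m)) + 1)


print(repetitions_needed(0.9, 0.01), fingerprint_qubits(1024, 4, 0.9, 0.01))
```

For example, with δ = 0.9 and ε = 0.01 each run fails with probability 0.905, and 47 runs are needed.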

It is worth considering what goes wrong if one tries to simulate the above quantum protocol
using classical mixtures in place of quantum superpositions. In such a protocol, Alice and Bob send
(i, E_i(x)) and (j, E_j(y)) respectively to the referee for independent, uniformly distributed random
i, j ∈ {1, 2, ..., m}. If it should happen that i = j then the referee can make a statistical inference
about whether or not x = y. But i = j occurs with probability only O(1/n), and it is unclear how
the referee could make an inference when i ≠ j. For many error-correcting codes, no
inference whatsoever about x = y is possible when i ≠ j, and the lower bound in [NS96] implies that
no error-correcting code enables inferences to be made when i ≠ j with error probability bounded
below 1. The distinguishing test in Figure 1 can be viewed as a quantum operation which has no
analogous classical probabilistic counterpart.

Our quantum protocol for equality in the simultaneous message model uses O(log n)-qubit
fingerprints for any constant error probability. Is it possible to use fewer qubits? In fact, without
a shared key, Ω(log n)-qubit fingerprints are necessary. This is because any k-qubit quantum state
can be specified within exponential precision with O(k2^k) classical bits. Therefore the existence of a
k-qubit quantum protocol implies the existence of an O(k2^k)-bit (deterministic) classical protocol.
Since a deterministic classical protocol for equality in this model requires messages of at least n bits,
we get k2^k ∈ Ω(n), from which we can infer that k ∈ Ω(log n).

3 Sets of pairwise-distinguishable states in low-dimensional spaces 

In Section 2, we employed a particular classical error-correcting code to construct a set of 2^n
quantum states with pairwise inner products below δ in absolute value. Here, we consider the
question of how few qubits are sufficient for this to be accomplished for an arbitrarily small δ > 0.
We show that log n + O(log(1/δ)) qubits are sufficient. While this gives somewhat better bounds
than the Justesen codes discussed in Section 2, unfortunately we only have a nonconstructive proof
of this fact. The proof follows.

Suppose d > 4n/(δ² log e). Then we claim there are 2^n unit vectors in ℝ^d with pairwise inner product
at most δ in absolute value. Consider two random vectors v, w in {+1, −1}^d/√d. Suppose
v and w agree in d′ coordinates and disagree in d − d′ coordinates; then their inner product is
⟨v|w⟩ = (2d′ − d)/d. Using a Chernoff bound [AS92, Corollary A.2] we have

$$\Pr\big[|\langle v|w\rangle| > \delta\big] = \Pr\big[|2d' - d| > \delta d\big] \le 2e^{-\delta^2 d/2}.$$
Now pick a set S of 2^n random vectors from {+1, −1}^d/√d. The probability that there are distinct
v, w ∈ S with large inner product is upper bounded by

$$\Pr\big[\exists \text{ distinct } v,w \in S \text{ with } |\langle v|w\rangle| > \delta\big] \le \sum_{\text{distinct } v,w \in S} \Pr\big[|\langle v|w\rangle| > \delta\big] \le \binom{2^n}{2} \cdot 2e^{-\delta^2 d/2} < 2^{2n} e^{-\delta^2 d/2}.$$

If d > 4n/(δ² log e) then this probability is less than 1, which implies the existence of a set S of 2^n vectors
having the right properties.

By associating {0,1}^n with the 2^n vectors above, we obtain fingerprints of log(4n/(δ² log e)) ∈
log n + O(log(1/δ)) qubits for any δ > 0.
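The probabilistic argument can be turned into a simple randomized construction for small n. This NumPy sketch (hypothetical function name and seed) picks d as the smallest integer above 4n/(δ² log e) = 4n·ln 2/δ², draws 2^n random sign vectors, and redraws until every pair has |⟨v|w⟩| ≤ δ. The redraw loop is an addition of this sketch; the union bound only guarantees that a single draw succeeds with positive probability.

```python
import math

import numpy as np


def random_sign_fingerprints(n: int, delta: float, seed: int = 0) -> np.ndarray:
    """Return 2**n unit vectors in R^d (as rows) with pairwise |<v|w>| <= delta."""
    # smallest integer d with d > 4n / (delta**2 * log2(e)) = 4n * ln(2) / delta**2
    d = math.floor(4 * n * math.log(2) / delta ** 2) + 1
    rng = np.random.default_rng(seed)
    while True:
        V = rng.choice([-1.0, 1.0], size=(2 ** n, d)) / math.sqrt(d)
        G = V @ V.T                      # all pairwise inner products
        np.fill_diagonal(G, 0.0)         # ignore <v|v> = 1
        if np.max(np.abs(G)) <= delta:
            return V


V = random_sign_fingerprints(n=6, delta=0.5)
print(V.shape, math.ceil(math.log2(V.shape[1])), "qubits")
```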

Up to constant factors, the nonconstructive method above is optimal in the following sense. Let
δ > 2^{−n}. Then an assignment of b-qubit states to all n-bit strings such that the absolute value of
the inner product between any two fingerprints is at most δ requires b ∈ Ω(log(n/δ)) qubits. In
order to demonstrate this, we will prove and then combine two lower bounds on b.

Firstly, the states can be used as fingerprints to solve the equality problem of communication
complexity with bounded-error probability in one round of communication (Alice sends the
fingerprint of her input x to Bob, who compares it with the fingerprint of his y). Therefore the known
lower bound for equality implies b ≥ c log n for some c > 0.

Secondly, pick a set of a = 1/δ different fingerprints (assume for simplicity that 1/δ is an integer).
These are complex unit vectors v_1, ..., v_a of dimension 2^b, whose pairwise inner products are at
most δ in absolute value. Let A be the a × 2^b matrix having the conjugated vectors v_i as rows and
let B be the 2^b × a matrix having the v_i as columns. Consider the a × a matrix C = AB. Its i,j entry
is C_ij = ⟨v_i|v_j⟩, so the diagonal entries of C are 1 and the off-diagonal entries are at most δ in
absolute value. This means that C is strictly diagonally dominant: |C_ii| = 1 > (a − 1)δ ≥ Σ_{j≠i} |C_ij|
for all i. It is known that such a matrix has full rank [HJ85, Theorem 6.1.10.a]. This implies that the
a vectors v_1, ..., v_a are linearly independent and hence must lie in a space of dimension at least a.
Thus 1/δ = a ≤ 2^b, hence b ≥ log(1/δ).
Since both lower bounds on b hold simultaneously, we have

$$b \ge \max\{c\log n,\ \log(1/\delta)\} \ge \frac{c\log n + \log(1/\delta)}{2} \in \Omega(\log(n/\delta)).$$
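The rank argument can be illustrated numerically. The sketch below (hypothetical function name) forms C = AB exactly as in the text and tests strict diagonal dominance. The example takes three real unit vectors with pairwise inner product δ = 1/3, obtained from a Cholesky factorization of their prescribed Gram matrix, so a = 1/δ = 3 and the vectors must span three dimensions.

```python
import numpy as np


def gram_is_strictly_diagonally_dominant(vectors) -> bool:
    """Rows of `vectors` are v_1, ..., v_a. Checks |C_ii| > sum_{j != i} |C_ij|
    for the Gram matrix C = AB, where A has rows conj(v_i) and B has columns v_i."""
    V = np.asarray(vectors, dtype=complex)
    C = V.conj() @ V.T                    # C_ij = <v_i|v_j>
    diag = np.abs(np.diag(C))
    off = np.abs(C).sum(axis=1) - diag    # row sums of off-diagonal magnitudes
    return bool(np.all(diag > off))


delta = 1 / 3
target = np.full((3, 3), delta)
np.fill_diagonal(target, 1.0)
L = np.linalg.cholesky(target)            # rows of L have Gram matrix `target`
print(gram_is_strictly_diagonally_dominant(L), np.linalg.matrix_rank(L))
```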

It should be noted that having small inner product δ is desirable but not all-important. For
instance, there is a trade-off between δ and the number of copies of each state sent by Alice and
Bob in the simultaneous message passing protocol for equality from the previous section in terms
of the total number of qubits communicated and the resulting error bound.



4 The state distinguishing problem 

Motivated by the fingerprinting scheme of Section 2, we define the state distinguishing problem as
follows. The input consists of k copies of each of two quantum states |φ⟩ and |ψ⟩ with a promise
that the two states are either identical or have inner product bounded in absolute value by some
given δ < 1. The goal is to distinguish between the two cases with as high probability as possible.

One method for solving this problem is to use the method in Section 2, independently performing
the test in Figure 1 k times, resulting in an error probability of 0 in the identical case and
((1 + δ²)/2)^k otherwise. We will describe an improved method, whose error probability is approximately
√(πk) ((1 + δ)/2)^{2k} (which is almost a quadratic reduction when δ is small). We also show that this is
nearly optimal by proving a lower bound of (1/4)((1 + δ)/2)^{2k} on the error probability.
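To give a feel for the size of the gap, the Python snippet below evaluates the three error expressions for a few values of k. The exact upper bound (k!)²/(2k)! · (1 + δ)^{2k}, of which √(πk)((1 + δ)/2)^{2k} is the Stirling approximation, is the one derived in Case 2 of the analysis that follows; the function names are hypothetical.

```python
import math


def repeated_swap_error(delta: float, k: int) -> float:
    """Error ((1 + delta**2) / 2) ** k of k independent runs of the Figure 1 test."""
    return ((1 + delta ** 2) / 2) ** k


def improved_error_bound(delta: float, k: int) -> float:
    """(k!)^2 / (2k)! * (1 + delta)^(2k); note (k!)^2 / (2k)! = 1 / C(2k, k)."""
    return (1 + delta) ** (2 * k) / math.comb(2 * k, k)


def improved_error_approx(delta: float, k: int) -> float:
    """Stirling approximation sqrt(pi * k) * ((1 + delta) / 2) ** (2k)."""
    return math.sqrt(math.pi * k) * ((1 + delta) / 2) ** (2 * k)


def lower_bound(delta: float, k: int) -> float:
    """Lower bound (1/4) * ((1 + delta) / 2) ** (2k) on the error of any method."""
    return ((1 + delta) / 2) ** (2 * k) / 4


for k in (5, 20, 80):
    d = 0.1
    print(k, repeated_swap_error(d, k), improved_error_bound(d, k), lower_bound(d, k))
```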
The improved method for the state distinguishing problem uses registers R_1, ..., R_{2k}, which
initially contain |φ⟩, ..., |φ⟩, |ψ⟩, ..., |ψ⟩ (k copies of each). It also uses a register P whose classical
states include encodings of all the permutations in S_{2k}. Let 0 denote the identity permutation and
let P be initialized to 0. Let F be any transformation satisfying

$$F|0\rangle = \frac{1}{\sqrt{(2k)!}} \sum_{\sigma \in S_{2k}} |\sigma\rangle.$$

Such a transformation can easily be computed in polynomial time.
The distinguishing procedure operates as follows:

1. Apply F to register P.

2. Apply a conditional permutation on the contents of registers R_1, ..., R_{2k}, conditioned on the
permutation specified in P.

3. Apply F† to P and measure the final state. If P contains 0 then answer equal, otherwise
answer not equal.



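The state after Step 2 is

$$\frac{1}{\sqrt{(2k)!}} \sum_{\sigma \in S_{2k}} |\sigma\rangle\, \sigma\big(|\phi\rangle \cdots |\phi\rangle\, |\psi\rangle \cdots |\psi\rangle\big)$$

(where σ(|φ⟩···|φ⟩|ψ⟩···|ψ⟩) means we permute the contents of the 2k registers according to σ).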

Case 1: |φ⟩ = |ψ⟩. In this case the permutation of the registers does absolutely nothing, so the state
after Step 2 is F|0⟩ ⊗ |φ⟩^{⊗2k}, Step 3 returns P to |0⟩, and the procedure answers equal with certainty.

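Case 2: Assume |⟨φ|ψ⟩| ≤ δ. The probability of answering equal is the squared norm of the vector
obtained by applying the projection |0⟩⟨0| ⊗ I to the final state, which is

$$p_{\mathrm{eq}} = \Big\| \frac{1}{\sqrt{(2k)!}} \sum_{\sigma \in S_{2k}} \langle 0|F^\dagger|\sigma\rangle\, \sigma\big(|\phi\rangle\cdots|\phi\rangle|\psi\rangle\cdots|\psi\rangle\big) \Big\|^2 .$$

Since ⟨0|F†|σ⟩ = 1/√((2k)!) for every σ, and ‖|η⟩‖² = ⟨η|η⟩ for any |η⟩, we may simplify this probability as follows:

$$\begin{aligned}
p_{\mathrm{eq}} &= \frac{1}{((2k)!)^2} \sum_{\sigma,\tau \in S_{2k}} \big(\langle\phi|\cdots\langle\phi|\langle\psi|\cdots\langle\psi|\big)\, \sigma^{-1}\tau\big(|\phi\rangle\cdots|\phi\rangle|\psi\rangle\cdots|\psi\rangle\big) \\
&= \frac{1}{(2k)!} \sum_{\sigma \in S_{2k}} \big(\langle\phi|\cdots\langle\phi|\langle\psi|\cdots\langle\psi|\big)\, \sigma\big(|\phi\rangle\cdots|\phi\rangle|\psi\rangle\cdots|\psi\rangle\big) \\
&= \frac{(k!)^2}{(2k)!} \sum_{j=0}^{k} \binom{k}{j}^{2} |\langle\phi|\psi\rangle|^{2j} \\
&\le \frac{(k!)^2}{(2k)!} \Big(\sum_{j=0}^{k} \binom{k}{j} \delta^{j}\Big)^{2} = \frac{(k!)^2}{(2k)!}\,(1+\delta)^{2k}.
\end{aligned}$$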
The sum of binomial coefficients arises by grouping the permutations σ according to the number j
of registers in the set {R_1, ..., R_k} that σ causes to contain |ψ⟩: there are (k!)²·C(k,j)² such
permutations, and each contributes |⟨φ|ψ⟩|^{2j}. Since (k!)²/(2k)! = 1/C(2k,k) ≈ √(πk)/4^k by Stirling's
formula, we therefore have p_eq ≲ √(πk) ((1 + δ)/2)^{2k}.
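The closed form for p_eq can be cross-checked by brute force for small k. This NumPy sketch (hypothetical function names) does not model the register P explicitly: as in the derivation above, the amplitude left on P = 0 after Step 3 is (1/(2k)!) Σ_σ σ(|φ⟩···|φ⟩|ψ⟩···|ψ⟩) for any F with the stated action on |0⟩, so the sketch builds that vector by permuting tensor axes (one axis per register) and takes its squared norm.

```python
import itertools
import math

import numpy as np


def accept_probability(phi: np.ndarray, psi: np.ndarray, k: int) -> float:
    """Brute-force p_eq of the permutation test on k copies of phi and k of psi."""
    d = len(phi)
    state = np.array([1.0 + 0j])
    for v in [phi] * k + [psi] * k:           # registers R_1, ..., R_2k
        state = np.kron(state, v)
    tensor = state.reshape((d,) * (2 * k))    # one tensor axis per register
    total = np.zeros_like(tensor)
    for perm in itertools.permutations(range(2 * k)):
        total += np.transpose(tensor, perm)   # permute the register contents
    total /= math.factorial(2 * k)
    return float(np.vdot(total, total).real)


def closed_form(overlap_sq: float, k: int) -> float:
    """(k!)^2 / (2k)! * sum_j C(k, j)^2 * |<phi|psi>|^(2j)."""
    s = sum(math.comb(k, j) ** 2 * overlap_sq ** j for j in range(k + 1))
    return s / math.comb(2 * k, k)


rng = np.random.default_rng(7)
phi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
phi, psi = phi / np.linalg.norm(phi), psi / np.linalg.norm(psi)
print(accept_probability(phi, psi, 2), closed_form(abs(np.vdot(phi, psi)) ** 2, 2))
```

For k = 1 the closed form reduces to (1/2)(1 + |⟨φ|ψ⟩|²), the acceptance probability of the single test in Figure 1.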

We now show that the error probability cannot be less than (1/4)((1 + δ)/2)^{2k} for the state distinguishing
problem. Consider an optimal state distinguisher that acts on k copies of |φ⟩ and k copies of |ψ⟩,
where either |φ⟩ = |ψ⟩ or |⟨φ|ψ⟩| ≤ δ. Let |φ_1⟩ = |ψ_1⟩ = |0⟩, and let |φ_2⟩ = cos(θ/2)|0⟩ + sin(θ/2)|1⟩
and |ψ_2⟩ = cos(θ/2)|0⟩ − sin(θ/2)|1⟩, where θ = cos^{−1}(δ). Clearly, |φ_1⟩ = |ψ_1⟩ and
⟨φ_2|ψ_2⟩ = cos²(θ/2) − sin²(θ/2) = δ. A state distinguisher must distinguish between the state
|a⟩ = |φ_1⟩^{⊗k} ⊗ |ψ_1⟩^{⊗k} and the state |b⟩ = |φ_2⟩^{⊗k} ⊗ |ψ_2⟩^{⊗k}. We now consider the probability
with which a state distinguisher can distinguish between these states. Since ⟨φ_1|φ_2⟩ = ⟨ψ_1|ψ_2⟩ = cos(θ/2),
it follows that ⟨a|b⟩ = cos^{2k}(θ/2) = ((1 + δ)/2)^k. Now, it is known that the optimal procedure
distinguishing between two states with inner product cos α has error probability
(1 − sin α)/2 = cos²α/(2(1 + sin α)) ≥ (1/4)cos²α. (This follows from an early result of
Helstrom [Hel67], which was later strengthened by Fuchs [Fuc95, Section 3.2]. A clean and
self-contained derivation of this result may also be found in [Pre00].) Therefore, taking
cos α = ((1 + δ)/2)^k, the state distinguisher must have error probability at least (1/4)((1 + δ)/2)^{2k}.
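The lower-bound calculation can be reproduced numerically. This sketch (hypothetical function names) assumes, as in the standard setting of the Helstrom bound, that the two states are equally likely. It evaluates the exact optimal error (1 − sin α)/2 in the algebraically equivalent form cos²α/(2(1 + sin α)), which avoids the floating-point cancellation in 1 − sin α when the overlap is tiny, and compares it with the simpler bound (1/4)cos²α.

```python
import math


def helstrom_error(overlap: float) -> float:
    """Optimal error for two equally likely pure states with |<a|b>| = cos(alpha),
    i.e. (1 - sin(alpha)) / 2, written as cos^2 / (2 * (1 + sin)) for stability."""
    return overlap ** 2 / (2 * (1 + math.sqrt(1 - overlap ** 2)))


def distinguisher_bounds(delta: float, k: int) -> tuple[float, float]:
    """(exact optimal error, simpler bound (1/4) * <a|b>^2) for the pair
    |a> = |0>^(2k) and |b> = |phi_2>^k |psi_2>^k constructed in the text."""
    theta = math.acos(delta)
    overlap = math.cos(theta / 2) ** (2 * k)   # equals ((1 + delta) / 2) ** k
    return helstrom_error(overlap), overlap ** 2 / 4


for k in (1, 5, 20):
    print(k, distinguisher_bounds(0.3, k))
```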

5 Errorless fingerprinting using a shared quantum key 

Finally, we consider briefly the case of fingerprinting where Alice and Bob have a shared quantum
key, consisting of O(log n) Bell states, but are required to output classical strings as fingerprints.
Is there any sense in which a quantum key can result in improved performance over the case of a
classical key? We observe that results in [BCT99] imply an improvement in the particular setting
where the fingerprinting scheme must be exact (i.e., the error probability is 0) and where there is a
restriction on the inputs that either x = y or the Hamming distance between x and y is n/2 (and
n is divisible by 4).

Under this restriction, any classical scheme with a shared key would still require fingerprints
of length Ω(n). On the other hand, there is a scheme with a shared quantum key of O(log n) Bell
states that requires fingerprints of length only O(log n) bits. See [BCT99] for details (the results
are partly based on results in [BCW98, FR87]). It should be noted that if the exactness condition
is relaxed to one where the error probability must be O(1/n^c) (for a constant c) then there also exists
a classical scheme with classical keys and fingerprints of length O(log n).